# Efficient inference optimization
## Helium-1-2b-Q8_0-GGUF
NikolayKozloff · Downloads: 53 · Likes: 3
A GGUF-format model converted from kyutai/helium-1-2b, supporting multiple European languages.
Tags: Large Language Model · Supports Multiple Languages
## Qwen3-0.6B-Base
unsloth · Downloads: 10.84k · Likes: 2
License: Apache-2.0
Qwen3-0.6B-Base belongs to the latest generation of the Qwen (Tongyi Qianwen) series of large language models, which offers both dense and Mixture-of-Experts (MoE) variants.
Tags: Large Language Model · Transformers
## BitNet-b1.58-2B-4T-GGUF
tdh111 · Downloads: 1,058 · Likes: 4
License: MIT
A 1.58-bit quantized large language model developed by Microsoft, designed for efficient inference and offered in IQ2_BN and IQ2_BN_R4 quantization variants.
Tags: Large Language Model
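To make the "1.58-bit" idea above concrete: each weight is constrained to the ternary set {-1, 0, +1} (log2(3) ≈ 1.58 bits) plus a shared scale. The sketch below is a minimal pure-Python illustration in the spirit of BitNet b1.58's absmean quantization; the function names are illustrative, not Microsoft's actual API, and real kernels operate on packed tensors.

```python
# Hedged sketch: ternary (1.58-bit) weight quantization with an absmean
# scale, as described for BitNet b1.58. Illustrative only.

def absmean_quantize(weights):
    """Quantize a list of floats to {-1, 0, +1} with a per-tensor scale."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    # Scale each weight, round to the nearest integer, clamp to [-1, 1].
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

def dequantize(ternary, scale):
    """Recover approximate float weights from the ternary codes."""
    return [t * scale for t in ternary]

q, s = absmean_quantize([0.8, -0.05, -1.2, 0.3])
print(q)  # → [1, 0, -1, 1]
```

Because every weight is -1, 0, or +1, the matrix multiply reduces to additions and subtractions, which is the source of the efficiency claim.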
## GLM-Z1-9B-0414-Q4_K_M-GGUF
Aldaris · Downloads: 205 · Likes: 2
License: MIT
A GGUF-format conversion of THUDM/GLM-Z1-9B-0414, supporting Chinese and English text generation.
Tags: Large Language Model · Supports Multiple Languages
## Hunyuan-7B-Instruct-0124
tencent · Downloads: 590 · Likes: 50
License: Other
Hunyuan-7B is an open-source large language model released by Tencent. It handles contexts up to 256K tokens, uses Grouped Query Attention (GQA), and performs strongly among Chinese 7B dense models.
Tags: Large Language Model · Transformers · English
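The GQA mechanism mentioned above shrinks the KV cache by letting several query heads share one key/value head. Below is a hedged, pure-Python sketch of that head routing under assumed toy shapes (per-head vectors as plain lists); production implementations batch this with tensors, and `gqa_attention` is a hypothetical name, not Hunyuan's API.

```python
# Hedged sketch: Grouped Query Attention head routing. Each group of
# n_q_heads // n_kv_heads query heads attends to one shared KV head.
import math

def gqa_attention(queries, keys, values, n_kv_heads):
    """queries: one vector per query head; keys/values: per-KV-head lists
    of (seq_len) vectors. Returns one output vector per query head."""
    n_q_heads = len(queries)
    group = n_q_heads // n_kv_heads      # query heads per KV head
    out = []
    for h, q in enumerate(queries):
        kv = h // group                  # shared KV head for this query head
        # Scaled dot-product scores against the shared keys.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))
                  for k in keys[kv]]
        # Numerically stable softmax over the sequence.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the shared values.
        out.append([sum(w * v[d] for w, v in zip(weights, values[kv]))
                    for d in range(len(values[kv][0]))])
    return out
```

With, say, 4 query heads and 2 KV heads, heads 0-1 read KV head 0 and heads 2-3 read KV head 1, halving the cached K/V state at 256K-token context lengths.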
## DeepSeek-R1-Distill-Llama-70B-GGUF
unsloth · Downloads: 11.51k · Likes: 79
DeepSeek-R1-Distill-Llama-70B is a 70B-parameter large language model built by the DeepSeek team on the Llama architecture, optimized through distillation and supporting efficient inference and fine-tuning.
Tags: Large Language Model · English
## Deepthink-Reasoning-7B-GGUF
bartowski · Downloads: 1,180 · Likes: 3
License: OpenRAIL
A llama.cpp imatrix quantization of Deepthink-Reasoning-7B, offered in multiple quantization types to suit different hardware.
Tags: Large Language Model · English
## Gemma-2b-it-Q4_K_M-GGUF
codegood · Downloads: 434 · Likes: 1
A GGUF-quantized version of the Gemma-2b-it model, suitable for local inference and supporting text generation.
Tags: Large Language Model · Transformers
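Several of the models above ship in Q4_K_M, one of GGUF's block-wise 4-bit formats. The sketch below is a simplified, hedged illustration of the general idea (fixed-size blocks, each stored as a per-block scale plus small integers); the real Q4_K_M layout additionally stores per-block minima and super-block scales, and these function names are illustrative, not llama.cpp's API.

```python
# Hedged sketch: block-wise 4-bit-style quantization. Each block of
# weights is reduced to one float scale plus small signed integers.

BLOCK = 32  # weights per block, matching llama.cpp's Q4 block size

def quantize_block(block):
    """Quantize one block of floats to integers in [-7, 7] plus a scale."""
    scale = max(abs(w) for w in block) / 7 or 1.0
    q = [max(-7, min(7, round(w / scale))) for w in block]
    return scale, q

def dequantize_block(scale, q):
    """Recover approximate float weights for one block."""
    return [scale * v for v in q]
```

The round-trip error per weight is bounded by half the block scale, which is why per-block (rather than per-tensor) scaling keeps 4-bit models usable for local inference.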
## Jamba-v0.1
ai21labs · Downloads: 6,247 · Likes: 1,181
License: Apache-2.0
Jamba is a state-of-the-art hybrid SSM-Transformer large language model that combines the Mamba architecture with Transformer layers, supports a 256K context length, and surpasses similarly sized models in throughput and performance.
Tags: Large Language Model · Transformers
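The SSM half of a hybrid like Jamba boils down to a linear recurrence whose state is constant-size per token, which is what makes very long contexts affordable compared with quadratic attention. Below is a hedged scalar toy version of that recurrence; real Mamba-style layers use learned, input-dependent parameters per channel, and `ssm_scan` is a hypothetical name.

```python
# Hedged sketch: the linear state-space recurrence at the heart of an
# SSM layer. O(1) state per step, O(n) work over the sequence.

def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """h_t = a*h_{t-1} + b*x_t ; y_t = c*h_t, scanned over a sequence."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x   # update the hidden state
        ys.append(c * h)    # emit the output for this step
    return ys

print(ssm_scan([1.0, 0.0, 0.0], a=0.5))  # → [1.0, 0.5, 0.25]
```

An impulse input decays geometrically through the state, and no past tokens need to be kept around, unlike a Transformer's growing KV cache.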
## Whisper-Telugu-Medium
vasista22 · Downloads: 228 · Likes: 2
License: Apache-2.0
A Telugu speech-recognition model fine-tuned from OpenAI's Whisper-medium, trained on several public Telugu ASR datasets.
Tags: Speech Recognition · Other